-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cp: Implement --sparse
flag
#3766
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
sylvestre
reviewed
Aug 2, 2022
src/uu/cp/src/cp.rs
Outdated
sparse_mode: SparseMode, | ||
context: &str, | ||
) -> CopyResult<()> { | ||
if (reflink_mode != ReflinkMode::Never) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change
if (reflink_mode != ReflinkMode::Never) { | |
if reflink_mode != ReflinkMode::Never { |
sylvestre
reviewed
Aug 2, 2022
sylvestre
reviewed
Aug 2, 2022
pimzero
force-pushed
the
3362-sparse-clean
branch
2 times, most recently
from
August 2, 2022 16:50
7fd289f
to
34b6cbc
Compare
sylvestre
reviewed
Aug 2, 2022
pimzero
force-pushed
the
3362-sparse-clean
branch
from
August 2, 2022 20:17
5be06c8
to
53ae311
Compare
Small conflict :) |
`--sparse` and `--reflink` options have a lot of similarities: - They have similar options (`always`, `never`, `auto`) - Both need OS specific handling - They can be mutually exclusive Prior to this change, `sparse` was defined as `CopyMode`, but `reflink` wasn't. Given the similarities, it makes sense to handle them similarly. The idea behind this change is to move all OS specific file copy handling in the `copy_on_write_*` functions. Those function then dispatch to the correct logic depending on the arguments (at the moment, the tuple `(reflink, sparse)`). Also, move the handling of `--reflink=never` from `copy_file` to the `copy_on_write_*` functions, at the cost of a bit of code duplication, to allow `copy_on_write_*` to handle all cases (and later handle `--reflink=never` with `--sparse`).
This begins to address uutils#3362 At the moment, only the `--sparse=always` logic matches the requirement form GNU cp info page, i.e. always make holes in destination when possible. Sparse copy is done by copying the source to the destination block by block (blocks being of the destination's fs block size). If the block only holds NUL bytes, we don't write to the destination. About `--sparse=auto`: according to GNU cp info page, the destination file will be made sparse if the source file is sparse as well. The next step are likely to use `lseek` with `SEEK_HOLE` detect if the source file has holes. Currently, this has the same behaviour as `--sparse=never`. This `SEEK_HOLE` logic can also be applied to `--sparse=always` to improve performance when copying sparse files. About `--sparse=never`: from my understanding, it is not guaranteed that Rust's `fs::copy` will always produce a file with no holes, as ["platform-specific behavior may change in the future"](https://doc.rust-lang.org/std/fs/fn.copy.html#platform-specific-behavior) About other platforms: - `macos`: The solution may be to use `fcntl` command `F_PUNCHHOLE`. - `windows`: I only see `FSCTL_SET_SPARSE`. This should pass the following GNU tests: - `tests/cp/sparse.sh` - `tests/cp/sparse-2.sh` - `tests/cp/sparse-extents.sh` - `tests/cp/sparse-extents-2.sh` `sparse-perf.sh` needs `--sparse=auto`, and in particular a way to skip holes in the source file.
pimzero
force-pushed
the
3362-sparse-clean
branch
from
August 3, 2022 15:55
53ae311
to
b8fc067
Compare
Indeed, bravo :)
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This begins to address #3362 (the logic behind the
--sparse=always
flag, for linux only at the moment)This PR does 2 things:
--sparse
code path to ~match--reflink
one--sparse=always
+ handling of--sparse
with--reflink
The
--sparse=auto
needs a bit more logic (as documented in the commit message).I think all the information needed (rationale, logic, future directions) is already explained in the commit, so copying them here:
1 - cp: Refactor
reflink
/sparse
handling to enable--sparse
flag--sparse
and--reflink
options have a lot of similarities:always
,never
,auto
)Prior to this change,
sparse
was defined asCopyMode
, butreflink
wasn't. Given the similarities, it makes sense to handle them similarly.The idea behind this change is to move all OS specific file copy handling in the
copy_on_write_*
functions. Those function then dispatch to the correct logic depending on the arguments (at the moment, the tuple(reflink, sparse)
).Also, move the handling of
--reflink=never
fromcopy_file
to thecopy_on_write_*
functions, at the cost of a bit of code duplication, to allowcopy_on_write_*
to handle all cases (and later handle--reflink=never
with--sparse
).2 - cp: Implement
--sparse
flagThis begins to address #3362
At the moment, only the
--sparse=always
logic matches the requirement form GNU cp info page, i.e. always make holes in destination when possible.Sparse copy is done by copying the source to the destination block by block (blocks being of the destination's fs block size). If the block only holds NUL bytes, we don't write to the destination.
About
--sparse=auto
: according to GNU cp info page, the destination file will be made sparse if the source file is sparse as well. The next step are likely to uselseek
withSEEK_HOLE
detect if the source file has holes. Currently, this has the same behaviour as--sparse=never
. ThisSEEK_HOLE
logic can also be applied to--sparse=always
to improve performance when copying sparse files.About
--sparse=never
: from my understanding, it is not guaranteed that Rust'sfs::copy
will always produce a file with no holes, as "platform-specific behavior may change in the future"About other platforms:
macos
: The solution may be to usefcntl
commandF_PUNCHHOLE
.windows
: I only seeFSCTL_SET_SPARSE
.This should pass the following GNU tests:
tests/cp/sparse.sh
tests/cp/sparse-2.sh
tests/cp/sparse-extents.sh
tests/cp/sparse-extents-2.sh
sparse-perf.sh
needs--sparse=auto
, and in particular a way to skip holes in the source file.